Competition on Spatial Statistics for Large Datasets
Authors
Abstract
As spatial datasets are becoming increasingly large and unwieldy, exact inference on spatial models becomes computationally prohibitive. Various approximation methods have been proposed to reduce the computational burden. Although comprehensive reviews of these approximation methods exist, comparisons of their performances are limited to small and medium-sized datasets for a few selected methods. To achieve a comparison comprising as many methods as possible, we organized the Competition on Spatial Statistics for Large Datasets. This competition had the following novel features: (1) we generated synthetic datasets with the ExaGeoStat software so that the number of generated realizations ranged from 100 thousand to 1 million; (2) we systematically designed the data-generating models to represent spatial processes with a wide range of statistical properties for both Gaussian and non-Gaussian cases; (3) the competition tasks included both estimation and prediction, and the results were assessed by multiple criteria; and (4) we have made all the datasets and competition results publicly available to serve as a benchmark for other approximation methods. In this paper, we disclose the competition details along with some analysis of the competition outcomes.
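To make concrete why exact inference does not scale, here is a minimal pure-Python sketch, assuming a zero-mean Gaussian process with an exponential covariance (the Matérn model with smoothness 0.5). The function names, the small diagonal nugget, and all parameter values are illustrative assumptions, not taken from the paper or from ExaGeoStat. The sketch simulates one realization by multiplying a Cholesky factor with independent standard normals and evaluates the exact log-likelihood; both steps rest on an O(n^3) Cholesky factorization of the n-by-n covariance matrix, which is the cost that becomes prohibitive at 100 thousand to 1 million locations.

```python
import math
import random


def exponential_cov(s1, s2, variance, range_):
    """Exponential covariance sigma^2 * exp(-d / phi) (Matern, smoothness 0.5)."""
    return variance * math.exp(-math.dist(s1, s2) / range_)


def cov_matrix(locs, variance, range_, nugget=1e-8):
    """Dense n x n covariance; the tiny nugget (an assumption) aids stability."""
    n = len(locs)
    return [[exponential_cov(locs[i], locs[j], variance, range_)
             + (nugget if i == j else 0.0) for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with a = L L^T: O(n^3) flops, O(n^2) memory."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for lower-triangular L by forward substitution."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def simulate(locs, variance, range_, seed=0):
    """Draw z = L e with e ~ N(0, I), so that Cov(z) = L L^T = Sigma."""
    rng = random.Random(seed)
    L = cholesky(cov_matrix(locs, variance, range_))
    e = [rng.gauss(0.0, 1.0) for _ in locs]
    return [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(len(locs))]


def exact_loglik(locs, z, variance, range_):
    """-(n log 2pi + log|Sigma| + z^T Sigma^{-1} z) / 2 via one Cholesky factor."""
    n = len(locs)
    L = cholesky(cov_matrix(locs, variance, range_))
    w = forward_solve(L, z)  # w = L^{-1} z, hence z^T Sigma^{-1} z = w^T w
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + sum(v * v for v in w))


if __name__ == "__main__":
    # Hypothetical 10 x 10 grid on the unit square (n = 100).
    grid = [(i / 9, j / 9) for i in range(10) for j in range(10)]
    z = simulate(grid, variance=1.0, range_=0.2, seed=42)
    for phi in (0.05, 0.2, 0.8):
        print(phi, exact_loglik(grid, z, variance=1.0, range_=phi))
```

Doubling n multiplies the factorization cost by roughly eight and the memory by four, which is why each likelihood evaluation inside an optimizer quickly becomes impractical at the dataset sizes used in the competition.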
Similar resources
Likelihoods for Large Spatial Datasets
Datasets in the fields of climate and environment are often very large and irregularly spaced. To model such datasets, the widely used Gaussian process models in spatial statistics face tremendous challenges due to the prohibitive computational burden. Various approximation methods have been introduced to reduce the computational cost. However, most of them rely on unrealistic assumptions of th...
Bayesian Modeling for Large Spatial Datasets
We focus upon flexible Bayesian hierarchical models for scientific data available at geo-coded locations. Investigators are increasingly turning to spatial process models to analyze such datasets. These models are customarily estimated using Markov Chain Monte Carlo (MCMC) methods, which have become especially popular for spatial modeling, given their flexibility and power to fit models that wo...
Sparse Density Representations for Simultaneous Inference on Large Spatial Datasets
Large spatial datasets often represent a number of spatial point processes generated by distinct entities or classes of events. When crossed with covariates, such as discrete time buckets, this can quickly result in a data set with millions of individual density estimates. Applications that require simultaneous access to a substantial subset of these estimates become resource constrained when d...
Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets
This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of...
Journal
Journal title: Journal of Agricultural Biological and Environmental Statistics
Year: 2021
ISSN: 1085-7117, 1537-2693
DOI: https://doi.org/10.1007/s13253-021-00457-z